Overview

Dataset statistics

Number of variables: 4
Number of observations: 2441666
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 392002
Duplicate rows (%): 16.1%
Total size in memory: 74.5 MiB
Average record size in memory: 32.0 B
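
The figures above can be cross-checked with a little arithmetic. The sketch below assumes each of the four numeric columns is stored as a 64-bit (8-byte) value; that storage width is an assumption, not something the report states.

```python
# Figures copied from the dataset statistics above.
n_rows = 2_441_666
n_cols = 4
duplicate_rows = 392_002

# Assumption: every numeric column holds 64-bit (8-byte) values.
bytes_per_value = 8

record_size = n_cols * bytes_per_value          # bytes per row
total_mib = n_rows * record_size / 2**20        # total size in MiB
duplicate_pct = 100 * duplicate_rows / n_rows   # share of duplicate rows

print(f"record size: {record_size} B")
print(f"total size:  {total_mib:.1f} MiB")
print(f"duplicates:  {duplicate_pct:.1f}%")
```

Under that assumption the results agree with the reported record size, total size, and duplicate percentage.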

Variable types

NUM: 4

Reproduction

Analysis started: 2020-03-29 20:03:48.959259
Analysis finished: 2020-03-29 23:45:19.645671
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 392002 (16.1%) duplicate rows [Duplicates]
FRECUENCY is highly skewed (γ1 = 29.65483338) [Skewed]
MONETARY is highly skewed (γ1 = 530.5060113) [Skewed]
AVGMONETARY is highly skewed (γ1 = 368.8070561) [Skewed]
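
The γ1 values in these warnings are skewness coefficients. As a rough illustration, the sketch below computes the plain moment-based skewness (third central moment divided by the second central moment raised to the power 1.5). The report does not say which estimator it uses; libraries often apply a small-sample correction, so treat this as an approximation of the idea rather than the report's exact formula. The sample lists are made up.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy data, not taken from the dataset.
symmetric = [1, 2, 3, 4, 5]     # balanced around its mean
long_tail = [1, 1, 1, 2, 100]   # one large value stretches the right tail

print(skewness(symmetric))
print(skewness(long_tail))
```

A symmetric sample gives a skewness of zero, while a single large value pulls the coefficient well above zero, which is the pattern the warnings flag for FRECUENCY, MONETARY, and AVGMONETARY.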

Variables

RECENCY
Real number (ℝ≥0)

Distinct count: 1845
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 511.2430815680769
Minimum: 3
Maximum: 1847
Zeros: 0
Zeros (%): 0.0%
Memory size: 18.6 MiB

Quantile statistics

Minimum: 3
5-th percentile: 13
Q1: 87
Median: 323
Q3: 833
95-th percentile: 1537
Maximum: 1847
Range: 1844
Interquartile range (IQR): 746

Descriptive statistics

Standard deviation: 499.7476266
Coefficient of variation (CV): 0.9775146982
Kurtosis: -0.302109578
Mean: 511.2430816
Median Absolute Deviation (MAD): 420.2925659
Skewness: 0.9184276292
Sum: 1248284850
Variance: 249747.6903
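
Several of the RECENCY statistics are derived from others, and the relationships can be checked directly. The sketch below uses the standard definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared) with the numbers reported above; the variable names are illustrative.

```python
# RECENCY figures copied from the report above.
mean = 511.2430816
std = 499.7476266
variance = 249747.6903
q1, q3 = 87, 833
minimum, maximum = 3, 1847

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance_from_std = std ** 2     # should match the reported variance

print(cv, iqr, value_range, variance_from_std)
```

The derived values agree with the reported CV, IQR, range, and variance up to rounding.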
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 3. 3.5 4.5 5.5 6.5 ... 1837.5 1841.5 1843.5 1844.5 1847. ], "bayesian blocks" binning strategy used)
Common values

Value                Count    Frequency (%)
5                    18389    0.8%
7                    18283    0.7%
6                    16820    0.7%
16                   15659    0.6%
14                   15141    0.6%
33                   14668    0.6%
8                    13846    0.6%
20                   13165    0.5%
19                   13048    0.5%
9                    12977    0.5%
Other values (1835)  2289670  93.8%

Minimum 5 values

Value  Count  Frequency (%)
3      420    < 0.1%
4      2298   0.1%
5      18389  0.8%
6      16820  0.7%
7      18283  0.7%

Maximum 5 values

Value  Count  Frequency (%)
1847   147    < 0.1%
1846   248    < 0.1%
1845   277    < 0.1%
1844   187    < 0.1%
1843   467    < 0.1%

FRECUENCY
Real number (ℝ≥0)

SKEWED
Distinct count: 911
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.534448200531932
Minimum: 1
Maximum: 6823
Zeros: 0
Zeros (%): 0.0%
Memory size: 18.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 3
Q3: 8
95-th percentile: 39
Maximum: 6823
Range: 6822
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 27.01969983
Coefficient of variation (CV): 2.833902839
Kurtosis: 3425.861139
Mean: 9.534448201
Median Absolute Deviation (MAD): 10.72564128
Skewness: 29.65483338
Sum: 23279938
Variance: 730.0641788
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.000e+00 1.500e+00 2.500e+00 3.500e+00 4.500e+00 ... 8.530e+02 1.105e+03 1.772e+03 3.577e+03 6.823e+03], "bayesian blocks" binning strategy used)
Common values

Value               Count   Frequency (%)
1                   857451  35.1%
2                   348301  14.3%
3                   203692  8.3%
4                   140380  5.7%
5                   104092  4.3%
6                   82318   3.4%
7                   67081   2.7%
8                   55950   2.3%
9                   47404   1.9%
10                  40677   1.7%
Other values (901)  494320  20.2%

Minimum 5 values

Value  Count   Frequency (%)
1      857451  35.1%
2      348301  14.3%
3      203692  8.3%
4      140380  5.7%
5      104092  4.3%

Maximum 5 values

Value  Count  Frequency (%)
6823   1      < 0.1%
4563   1      < 0.1%
4309   1      < 0.1%
3600   1      < 0.1%
3554   1      < 0.1%

MONETARY
Real number (ℝ≥0)

SKEWED
Distinct count: 469481
Unique (%): 19.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37852.1625447625
Minimum: 0.0
Maximum: 3971862682.06
Zeros: 151
Zeros (%): < 0.1%
Memory size: 18.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 20.65
Median: 126.08
Q3: 696.9875
95-th percentile: 17200
Maximum: 3971862682
Range: 3971862682
Interquartile range (IQR): 676.3375

Descriptive statistics

Standard deviation: 4231783.144
Coefficient of variation (CV): 111.7976586
Kurtosis: 395786.1512
Mean: 37852.16254
Median Absolute Deviation (MAD): 70736.5076
Skewness: 530.5060113
Sum: 9.242233831e+10
Variance: 1.790798857e+13
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000000e+00 5.00000000e-03 1.50000000e-02 2.50000000e-02 3.50000000e-02 ... 4.21185316e+07 8.18541412e+07 1.83170762e+08 6.11292195e+08 3.97186268e+09], "bayesian blocks" binning strategy used)
Common values

Value                  Count    Frequency (%)
1                      56019    2.3%
100                    35227    1.4%
200                    33104    1.4%
20                     28916    1.2%
40                     16833    0.7%
60                     13966    0.6%
10                     12243    0.5%
800                    11602    0.5%
2                      11393    0.5%
0.02                   11186    0.5%
Other values (469471)  2211177  90.6%

Minimum 5 values

Value  Count  Frequency (%)
0      151    < 0.1%
0.01   7594   0.3%
0.02   11186  0.5%
0.03   1311   0.1%
0.04   1084   < 0.1%

Maximum 5 values

Value       Count  Frequency (%)
3971862682  1      < 0.1%
2189348945  1      < 0.1%
2181740452  1      < 0.1%
1509163532  1      < 0.1%
1207394588  1      < 0.1%

AVGMONETARY
Real number (ℝ≥0)

SKEWED
Distinct count: 799770
Unique (%): 32.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1632.0347265716673
Minimum: 0.0
Maximum: 60613785.83
Zeros: 151
Zeros (%): < 0.1%
Memory size: 18.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.92
Q1: 8.330866935
Median: 37
Q3: 157
95-th percentile: 2336.824607
Maximum: 60613785.83
Range: 60613785.83
Interquartile range (IQR): 148.6691331

Descriptive statistics

Standard deviation: 86711.38196
Coefficient of variation (CV): 53.1308437
Kurtosis: 188563.2826
Mean: 1632.034727
Median Absolute Deviation (MAD): 2844.801142
Skewness: 368.8070561
Sum: 3984883703
Variance: 7518863762
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000000e+00 5.00000000e-03 1.01562500e-02 1.45000000e-02 1.55000000e-02 ... 1.10265705e+06 2.25993069e+06 5.00003692e+06 1.32426018e+07 6.06137858e+07], "bayesian blocks" binning strategy used)
Common values

Value                  Count    Frequency (%)
1                      62254    2.5%
100                    36197    1.5%
200                    33216    1.4%
20                     29517    1.2%
9.99                   19363    0.8%
40                     16090    0.7%
11.99                  15497    0.6%
60                     14070    0.6%
10                     12785    0.5%
800                    11410    0.5%
Other values (799760)  2191267  89.7%

Minimum 5 values

Value          Count  Frequency (%)
0              151    < 0.1%
0.01           8084   0.3%
0.0103125      1      < 0.1%
0.01057692308  1      < 0.1%
0.011          1      < 0.1%

Maximum 5 values

Value        Count  Frequency (%)
60613785.83  1      < 0.1%
50000000     1      < 0.1%
35890966.31  1      < 0.1%
33090909.09  1      < 0.1%
30597459.46  1      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
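
The calculation described above can be sketched in plain Python. The function name `pearson_r` and the sample data are illustrative only, and the code uses population (divide-by-n) moments; the choice of n or n - 1 cancels out in the ratio.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Hypothetical toy data: y is an exact linear function of x.
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]
rescaled = [10 * y - 3 for y in ys]   # change of location and scale
reversed_ys = [-y for y in ys]        # perfect negative linear relation

print(pearson_r(xs, ys), pearson_r(xs, rescaled), pearson_r(xs, reversed_ys))
```

Shifting and rescaling `ys` leaves r unchanged, while negating it flips the sign, which illustrates the invariance property mentioned above.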

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
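
A minimal sketch of that procedure: convert each variable to ranks, then apply the Pearson formula to the ranks. The helper names and data are illustrative, and the tie handling (tied values share their average rank) is a common convention assumed here, not something the report specifies.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    return cov / (std_x * std_y)

# Hypothetical toy data: a nonlinear but strictly increasing relation.
xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]

print(average_ranks([10, 20, 20, 30]))
print(spearman_rho(xs, cubes))
```

Because the cubic relation is strictly increasing, the ranks of both variables coincide and ρ reaches its maximum even though the relation is not linear.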

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
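
The pair-counting definition above can be sketched as follows. This is the simple variant that divides by the total number of pairs and counts tied pairs as neither concordant nor discordant; statistical libraries often use a tie-corrected variant instead, so results can differ on data with ties. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total pairs, with ties counted as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)

# Hypothetical toy data: one swapped pair among four observations.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

tau = kendall_tau(xs, ys)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.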

Missing values

Sample

First rows

   RECENCY  FRECUENCY  MONETARY      AVGMONETARY
0  209      72         2,945.9       40.9
1  831      31         806,000.0     26,000.0
2  573      2          907.4         453.7
3  226      2          500.4         250.2
4  1400     8          371,042.5     46,380.3
5  523      1          1,115.1       1,115.1
6  141      2          21.7          10.9
7  33       99         12,422,280.0  125,477.6
8  51       140        8,055,781.5   57,541.3
9  833      8          465.7         58.2

Last rows

         RECENCY  FRECUENCY  MONETARY  AVGMONETARY
2441656  1818     1          1,760.2   1,760.2
2441657  1826     1          3,561.8   3,561.8
2441658  1820     1          6,701.6   6,701.6
2441659  1829     1          8,789.6   8,789.6
2441660  1841     2          1,411.0   705.5
2441661  1827     1          206.0     206.0
2441662  1841     1          1,081.0   1,081.0
2441663  1819     1          11,995.0  11,995.0
2441664  1822     2          9,875.9   4,938.0
2441665  1836     2          50,937.5  25,468.7